10 research outputs found

    Linked data in BGS and its potential in model fusion

    The British Geological Survey has been conducting a pilot project into the use of Linked Data. Linked Data is a best practice for using the web to expose, share and connect pieces of data, information and knowledge. It facilitates connections between previously unrelated data, and lowers the barriers to linking data currently linked using other methods. In essence, linked data involves publishing snippets of information as independent ‘triples’, made up of a subject, a predicate and an object. A subject is referenced by a URI and can represent any resource: a person, organisation, concept, dataset, model, application etc. A predicate is a property or relationship assigned to the subject, and is also referenced as a URI. An object is the value of the property or object of the relationship; this may be a resource referenced as a URI or a literal value such as a number or text string. Data linkages come about because anyone can publish a statement about anyone else’s resources, and resource URIs for subjects and objects can be matched up. Data linkages are also enhanced because anyone can (and should where possible) re-use anyone else’s predicates, thereby using a common language to describe information. BGS’s pilot project is about to publish three of our major vocabularies (Lexicon of Named Rock Units, Geochronological timescale, Rock Classification Scheme) and our 625k 2D geological map in linked data form. We have added links between our resources and those defined in external linked data sources where possible, including DBPedia (a linked data version of Wikipedia), the Ordnance Survey and the BBC Wildlife Finder website. Further work is necessary to improve the links to parallel vocabulary schemes defined by international organisations. 
The benefit of linked data is that rather than an end-user having to do investigative work to uncover the syntax and semantics of disparate datasets in order to integrate them, data published according to the Linked Data recommendations provides this information up front in an unambiguous and instantly available form. The user will have all the information at hand to integrate the data in a logical and scientifically valid way. This presentation will speculate as to how this approach may be applied to enable models to communicate and exchange information at run-time, for example using an interoperable vocabulary for physical properties, spatial and temporal dimensions and methodologies. Linked data can also be used to describe a common vocabulary for model parameters and the relationships and dependencies between them, thereby exposing feedback mechanisms between separate models or algorithms.
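The triple model described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the BGS pilot: the `example.org` URIs, the `abstract` predicate and the sample statements are invented for illustration, and only the `rdfs:label` and `owl:sameAs` predicate URIs are real, re-used vocabulary terms. The sketch shows how two independently published sets of triples become linked once a subject URI in one matches an object URI in the other.

```python
from collections import defaultdict

# Hypothetical namespaces, for illustration only (not real BGS or DBpedia URIs).
BGS = "http://example.org/bgs/"
EXT = "http://example.org/external/"

# Real, widely re-used predicates: re-using them gives publishers a common language.
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Each statement is an independent (subject, predicate, object) triple.
bgs_triples = [
    (BGS + "rock/granite", LABEL, "Granite"),          # object is a literal
    (BGS + "rock/granite", SAME_AS, EXT + "Granite"),  # object is a resource URI
]
# Statements published by someone else (invented sample data).
external_triples = [
    (EXT + "Granite", EXT + "abstract", "A coarse-grained igneous rock."),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject URI."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def follow_links(subject, local, remote, link_predicate=SAME_AS):
    """Return remote statements about resources that `subject` links to."""
    found = []
    for predicate, obj in local.get(subject, []):
        if predicate == link_predicate:
            found.extend(remote.get(obj, []))
    return found


local_index = index_by_subject(bgs_triples)
remote_index = index_by_subject(external_triples)
linked = follow_links(BGS + "rock/granite", local_index, remote_index)
```

Here `linked` holds the external statement about granite, reached purely by matching URIs. A real deployment would more likely use an RDF library and a triplestore than plain tuples.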

    The British Geological Survey Rock Classification Scheme, its representation as linked data, and a comparison with some other lithology vocabularies

    Controlled vocabularies are critical to constructing FAIR (findable, accessible, interoperable, re-useable) data. One of the most widely required, yet complex, vocabularies in earth science is for rock and sediment type, or ‘lithology’. Since 1999 the British Geological Survey has used its own Rock Classification Scheme in many of its workflows and products including the national digital geological map. This scheme pre-dates others that have been published, and is deeply embedded in BGS’ processes. By publishing this classification scheme now as a Simple Knowledge Organisation System (SKOS) machine-readable informal ontology, we make it available for ourselves and third parties to use in modern semantic applications, and we open the future possibility of using the tools SKOS provides to align our scheme with other published schemes. These include the IUGS-CGI Simple Lithology Scheme, the European Commission INSPIRE Lithology Code List, the Queensland Geological Survey Lithotype Scheme, the USGS Lithologic Classification of Geologic Map Units, and Mindat.org. The BGS lithology classification was initially based on four narrative reports that can be downloaded from the BGS website, although it has been added to subsequently. The classification is almost entirely mono-hierarchical in nature and includes 3454 currently valid concepts in a classification 11 levels deep. It includes igneous rocks and sediments, metamorphic rocks, sediments and sedimentary rocks, and superficial deposits including anthropogenic deposits. The SKOS informal ontology built on it is stored in a triplestore and the triples are updated nightly by extracting from a relational database where the ontology is maintained. Bulk downloads and version history are available on GitHub. The RCS concepts themselves are used in other BGS linked data, namely the Lexicon of Named Rock Units and the linked data representation of the 1:625 000 scale geological map of the UK. 
Comparing the RCS with the other published lithology schemes, all are broadly similar but show characteristics that reveal the interests and requirements of the groups that developed them, in terms of their level of detail both overall and in constituent parts. It should be possible to align the RCS with the other classifications, and future work will focus on automated mechanisms to do this, and possibly on constructing a formal ontology for the RCS.
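The structural properties quoted in this abstract (an almost entirely mono-hierarchical scheme, a fixed number of levels) can be checked on any SKOS-style `broader` mapping. The sketch below is an assumption-laden illustration: the concept identifiers and their relationships are invented, are not real RCS codes, and are not meant to be geologically authoritative.

```python
# Hypothetical concept identifiers mapped to their broader (parent) concepts.
# Invented for illustration; the real RCS hierarchy is not reproduced here.
BROADER = {
    "rock": [],
    "igneous-rock": ["rock"],
    "granite": ["igneous-rock"],
    "sedimentary-rock": ["rock"],
    "sandstone": ["sedimentary-rock"],
    "volcaniclastic-sandstone": ["sandstone", "igneous-rock"],
}


def poly_hierarchical(broader):
    """List concepts with more than one broader concept (exceptions to a mono-hierarchy)."""
    return sorted(c for c, parents in broader.items() if len(parents) > 1)


def depth(concept, broader):
    """Level of a concept, counting a top concept as level 1 (assumes no cycles)."""
    parents = broader[concept]
    if not parents:
        return 1
    return 1 + max(depth(parent, broader) for parent in parents)


def max_depth(broader):
    """Number of levels in the deepest branch of the scheme."""
    return max(depth(concept, broader) for concept in broader)


exceptions = poly_hierarchical(BROADER)
levels = max_depth(BROADER)
```

In this toy scheme `exceptions` contains the single concept with two parents and `levels` is the length of the longest chain from a top concept. The same two checks could be run against a bulk download of a real scheme once its `skos:broader` statements are loaded into such a mapping.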

    Ontology alignment based on word embedding and random forest classification

    Ontology alignment is crucial for integrating heterogeneous data sources and forms an important component for realising the goals of the semantic web. Accordingly, several ontology alignment techniques have been proposed and used for discovering correspondences between the concepts (or entities) of different ontologies. However, these techniques mostly depend on string-based similarities which are unable to handle the vocabulary mismatch problem. Also, determining which similarity measures to use and how to effectively combine them in alignment systems are challenges that have persisted in this area. In this work, we introduce a random forest classifier approach for ontology alignment which relies on word embedding to discover semantic similarities between concepts. Specifically, we combine string-based and semantic similarity measures to form feature vectors that are used by the classifier model to determine when concepts match. By harnessing background knowledge and relying on minimal information from the ontologies, our approach can deal with knowledge-light ontological resources. It also eliminates the need for learning the aggregation weights of multiple similarity measures. Our experiments using the Ontology Alignment Evaluation Initiative (OAEI) dataset and real-world ontologies highlight the utility of our approach and show that it can outperform state-of-the-art alignment systems.
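The idea of combining string-based and semantic similarity measures into one feature vector per concept pair can be sketched as follows. This is not the authors' implementation: the particular measures, the function names and the three-dimensional `TOY_VECTORS` are assumptions made for illustration, and a real system would load pretrained word embeddings and pass the resulting vectors to a trained classifier such as a random forest.

```python
import math
from difflib import SequenceMatcher

# Toy word vectors invented for illustration; real systems use pretrained embeddings.
TOY_VECTORS = {
    "author": [0.9, 0.1, 0.0],
    "writer": [0.8, 0.2, 0.1],
    "paper": [0.1, 0.9, 0.2],
}


def edit_ratio(a, b):
    """String-based similarity on characters, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """String-based similarity on shared word tokens, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def label_vector(label):
    """Average the vectors of the known words in a label; None if no word is known."""
    words = [w for w in label.lower().split() if w in TOY_VECTORS]
    if not words:
        return None
    dims = len(TOY_VECTORS[words[0]])
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(dims)]


def cosine(u, v):
    """Semantic similarity between two label vectors; 0.0 if either is missing."""
    if u is None or v is None:
        return 0.0
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def feature_vector(label_a, label_b):
    """One row of classifier input: two string-based features and one semantic feature."""
    return [
        edit_ratio(label_a, label_b),
        token_jaccard(label_a, label_b),
        cosine(label_vector(label_a), label_vector(label_b)),
    ]


features = feature_vector("Author", "Writer")
```

For the pair "Author" and "Writer" the string-based features are low while the embedding-based feature is high, which is the vocabulary mismatch case the abstract describes. Because a classifier learns how to use the features jointly, no hand-tuned aggregation weights appear anywhere in the sketch.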

    Taxonomic corpus-based concept summary generation for document annotation

    Semantic annotation is an enabling technology which links documents to concepts that unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, the annotation process is a challenging task as annotators often have to select from thousands of potentially relevant concepts from controlled vocabularies. The best approaches to assist in this task rely on reusing the annotations of an annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer due to insufficient descriptive texts for concepts in most vocabularies. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content that we generate effectively represents the concepts. Also, our approach outperforms those which rely on information from a thesaurus alone and is comparable with supervised approaches.

    New 3D models for BGS's Geology of Britain Viewer

    The BGS Geology of Britain viewer now provides access to five example 3D geological models of the UK: Ingleborough, Ipswich, Isle of Wight, Thurrock and York.

    Understanding the ‘lived experience’ of unaccompanied young women: challenges and opportunities for social work

    This article reports a small-scale piece of qualitative research, undertaken within a specialist social work team for asylum-seeking or trafficked young people, in a rural local authority in the UK. Following a multi-perspectival approach, it utilised both discourse theory and psycho-social theory to describe how social workers were drawing both on ways of talking and ways of feeling in their constructions of young women. The research concluded that a relationship-based model of social work is essential to prevent practitioners from falling back onto generalised social discourses or unconsidered emotional responses. It is further argued that social workers involved with this group of young people need access to specialist sources of training and knowledge outside of their organisation, and that building national and international links between practitioners in this field could further strengthen practice.

    Retrotransposons Are the Major Contributors to the Expansion of the Drosophila ananassae Muller F Element

    The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (∼5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a similar range of expression levels compared to genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains.